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1. Introduction 


The classical conjugate gradient method (CG) of Hestenes and Stiefel [23] — considered 
as an iterative method and used in conjunction with preconditioning (see e.g. [7]) — is 
one of the most powerful techniques for solving large linear systems 

Ax = b (1) 

with Hermitian positive definite coefficient matrix A. A considerable part of the research 
in numerical linear algebra since the mid-seventies has been devoted to generalizations of 
CG to indefinite and non-Hermitian matrices (see e.g. [39,36,35] for a survey). Almost 
exclusively, the focus was on real matrices, and the case of complex coefficient matrices 
A has received surprisingly little attention. Among the few exceptions, where complex 
matrices are included, are two recent papers by Faber and Manteuffel [13,14] (see also 
Joubert and Young [26]) in which they develop a theory on the existence of conjugate 
gradient type methods with certain desirable features adopted from classical CG. They 
showed that “nice” extensions of CG — with iterates defined by an error minimization or 
a Galerkin type condition over Krylov subspaces generated by A and computable by an 
^-term recursion — exist essentially only for matrices of the form 

A = e t6 (T + ierl) where T = T H is Hermitian , cr, 9 £ 1R . (2) 

Here I denotes the identity matrix. Actual implementations of such CG-type methods are 
well known for the real matrices in the class (2). Paige and Saunders [30] were the first to 
devise numerically stable algorithms (SYMMLQ and MINRES) for real symmetric, but in 
general indefinite A. Concus, Golub [6], and Widlund [44] found a Galerkin type method 
for the subclass of real nonsymmetric matrices 

A = I — N where N = — N T is real and skew-symmetric . (3) 

The first minimal residual type algorithm for (3) was proposed by Rapoport [33] (see also 
[11,18] for different implementations). 

In this paper, we present a detailed study, with the emphasis on practical aspects, 
of CG-type methods for arbitrary complex matrices of the form (2). Besides the two 
standard approaches based on a minimal residual property and a Galerkin condition, also 
a third less conventional method with iterates defined by an Euclidian, error minimization 
is considered. In particular, it is shown how SYMMLQ and MINRES can be extended to 
numerically stable implementations of all three approaches, and we derive error bounds. 
For the practical use of CG-type methods it is crucial that they can be combined with 
efficient preconditioners. Unfortunately, the more classical techniques, such as incomplete 
factorization, lead to preconditioned matrices which in general are no longer in the class 
(2). We show that this problem can be resolved and the special structure of the matrices 
(2) preserved by using polynomial preconditioning, and results on the optimal choice of the 
preconditioner are given. Note that polynomial preconditioning is an attractive approach 
for vector and parallel computers and, because of that, has become very popular in recent 
years (see [35] for a survey). 
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Finally, we remark that large linear systems with complex coefficient matrices of type 
(2) or of the more general form 

A = e t6 (T + icrD ) where T = T H is Hermitian , <r,9 € 1R , (4) 

where now D is a real positive semi-definite diagonal matrix, arise in important applica- 
tions. Partial differential equations which model dissipative processes (e.g. [32, Chapter 
10], [28]) usually involve complex coefficient functions and/or complex boundary conditions, 
and finite difference approximations lead to complex linear systems. A typical example is 
the complex Helmholtz equation 


—A u — cri u + ia^u — / (5) 

which describes the propagation of damped time-harmonic waves as e.g. electromagnetic 
waves in conducting media (e.g. [12, Chapter 8]). Further applications, which give rise 
to complex linear systems, include underwater acoustics [3,20,37], discretization of the 
time- dependent Schrodinger equation using implicit difference schemes [8], electromagnetic 
scattering problems [31], numerical computations in quantum chromodynamics [2], and 
numerical conformal mapping [43]. In these examples, the resulting matrices usually are 
of the form (4). Moreover, in almost all applications the case (2), D = /, occurs or D is at 
least close to I. Furthermore, T is typically indefinite, but has often, as for the discretized 
Helmholtz equation (5), only relatively few negative eigenvalues. 

The paper is organized as follows. In Section 2, we define the three different CG-type 
approaches for matrices (2) via certain optimality poperties of their iterates and establish 
some connections with the Hermitian Lanczos algorithm. Actual implementations of all 
three methods are then presented in Section 3. We also point out how these algorithms can 
be adapted to matrices of the family (4) for the case that D is a positive definite diagonal 
matrix. In Section 4, other implementations which have been proposed in the literature are 
briefly reviewed and operation counts for the various algorithms are given. In Section 5, we 
derive error bounds and present some new results on related constrained approximation 
problems. In Section 6, polynomial preconditioning is considered. Finally, in Section 
7, we report on some numerical experiments for matrices arising from finite difference 
approximations to the complex Helmholtz equation (2) with constant coefficients <7i , <r 2 . 

Throughout the paper, all vectors and matrices, unless stated otherwise, are assumed 
to be complex. We use the notation 

Kk(c, B ) := span {c, Be , . . . , B k ~ l c} 

for the fcth Krylov subspace of C n generated by c G C n an d the nxn matrix B. Moreover, 
(x,y) = y H x is the Euclidian inner product and ||a?|| = Vx H x the associated norm. For a 
Hermitian positive definite matrix D, ||x|[d = Vx H Dx denotes the D- norm of x. Finally, 
Amin(T) (resp. A mai (T)) is the smallest (resp. largest) eigenvalue of the Hermitian matrix 
T. 
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2. Three conjugate gradient type approaches for shifted Hermitian matrices 

We are concerned with the solution of complex linear systems (1) with n x n coefficient 
matrices of the form 

A = T + icrl where T = T H is Hermitian , a G IR . (6) 

Clearly, by multiplication of the right-hand side b or the unknown vector x by e~ l6 the 
more general case (2) can always be reduced to (6). Although our main interest is in 
non-Hermitian A, we include the case <7 = 0 and assume that A = T is nonsingular then. 
This guarantees that A is always nonsingular, and the exact solution of (1) is denoted by 
Z* := A -1 6. 

We consider three different CG-type appropaches for solving (1). For any given start- 
ing vector xo £ C n , all three methods compute a sequence of approximations Xk, k = 
1,2, . . ., to x* which — in exact arithmetic — would terminate after at most n steps with 
X*. The first two approaches are based on the usual Krylov subspaces Kk := K k (ro, A), 
k = 1,2, . . ., generated by A and the starting residual r 0 := b — Ax 0 . The iterates of the 
minimal residual (MR) method are characterized by 

||fe-Axfc||= min ||6 — Ax|| , x k £ x 0 + Kk . (7) 

The Galerkin (GAL) (or orthogonal error [14]) approach aims at computing approximations 
Xk satisfying 


(b — Axk,y) = 0 for all y £ Kk , x k £ x 0 + K k . (8) 

Note that, for Hermitian positive definite A, this method is equivalent to the classical CG 
algorithm (see e.g. [30]). While MR and GAL are standard approaches for non-Hermitian 
matrices, the third method we propose is less conventional. Its iterates are defined by the 
minimal Euclidian error (ME) property 

||x*-x fc ||= min ||x*-x|| , x k £ x 0 + K* k (9) 

xezo+K* 


where now 


Kt := K k (A H r 0 ,A) = span{ A H r„, A A H r 0 ,. . . , A 1 *' 1 A H r 0 } = A H K k (10) 

denotes the Krylov subspace generated by A and A h tq. Note that the last identity in (10) 
is true since matrices (6) are normal and thus 

AA h = A h A . (11) 

In the sequel, if it is not evident from the context which method we are considering, the 
superscripts MR, GAL, and ME will be used to distinguish iterates x k and the corresponding 
residual vectors r k := b — Ax k of the different approaches. 
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Remark 1. MINRES and SYMMLQ [30] are numerically stable implementations of the 
MR and GAL methods, respectively, for real symmetric matrices A. If A is indefinite, a 
Galerkin iterate satisfying (8) need not exist for every k, Paige and Saunders resolve this 
problem in SYMMLQ by actually working with a sequence of well-defined auxiliary vectors 
from which the existing Galerkin iterates can then be computed in a stable manner. 
Remark 2. The ME approach (9) is a generalization of Fridman’s method [17] for real 
symmetric matrices A. However, the algorithm he proposed is numerically unstable (see 
[18,40] for an explanation of the instability and a simple remedy). Fletcher [16] showed that 
the sequences of the Fridman iterates and the auxiliary vectors generated by SYMMLQ are 
mathematically equivalent. Therefore, as a by-product, SYMMLQ also yields a numerically 
stable implementation of Fridman’s method. 

We now turn to the derivation of algorithms, modelled after SYMMLQ and MINRES, 
for the actual computation of the iterates defined by (7)-(9). The main ingredient is the 
Lanczos algorithm [27] applied to the Hermitian part T of (6) and with t*o as starting 
vector: 

Algorithm 1 (Lanczos). 

0) Set u 0 = 0, /3i = ||r 0 ||, v = r 0 . 

For k = 1, 2, . . . 

1) I{ 1 3k =0, stop. 

Otherwise, compute 

2) v k = v/p k , 

v = Tv k - a k v k - PkVk-i with a k = (Tv k ,v k ) , 

Pk + 1 = IM| • 


We refer to [21, pp. 325] for a detailed discussion of the Lanczos algorithm; in particular, 
proofs of the properties collected in the following proposition can be found there. We set 


A 


Vk := (vi,v 2 ,...,v k ) and T k := 


0 


A 

a 2 



0 ••• 0 \ 


*. *. 0 

fik 

0 /3 k <*k / 


Proposition 1. a) In exact arithmetic, Algorithm 1 stops for k = m + 1 where 
m := dim K n (r 0 ,T). 
b) For k = 1,2, . . . ,m : 


V k H V k = Ik , the k x k identity matrix , (12) 

TV k = V k T k + f3 k+ iv k+1 el , e k '•= (0,...,0,1) T € lR k , v m+1 := 0 , (13) 

K k {r 0 ,T) = span{vi,i> 2 ,...,Vjfe} • (14) 
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Remark 3. The termination index m is just the minimal number of components in any 
expansion of tq into orthonormal eigenvectors of T, i.e. 


ro=£ plUl ’ 


1=1 (15) 

Pi ^ 0 , Tui = X t ui , (uj,ui)-6ji , \j ^ \i , l,j,= , 

and no such representation with a smaller m exists. 

From now on , we assume that k 6 {1,2,..., m}. Note that (14) is equivalent to 

K k = span{ui,v 2 ,...t>i} = {V k z | z € C*} , (14') 

since A and T differ only by a scalar multiple of the identity matrix. Moreover, with 


S k 


= (T k + i 
\ /?*+!' 


ialk ^ 


and by adding icrVk to both sides of (13), we obtain 

AVk=V k +iS k . 

Remark that, since A is nonsingular and V k has full column rank, (13') implies 

rank S k = k . 


(16) 


(13') 


(17) 


Next, we rewrite (7)-(9) in terms of V* and S k • := (1,0,. . . ,0) T denotes the first unit 

vector; its actual length will be clear from the context. 

Proposition 2. a) xj^ = xo + VkZ^f 11 where is the solution of 

/ 3 1 S?e 1 =S?SkZ . (18) 

b) = xo + A H V k zjf E where zj^ is the solution of 

/3ie 1 =S k S k z . (19) 

c) x^^ = x 0 + VfcZ®- 41 ' where z£ 7j4i is the solution of 

Piei = ( T k + icrl k )z . (20) 

Moreover, if a = 0 and T k is singular, then no Galerkin iterate satisfying (8) exists. 

d) x%* = xjf 1 = xZ* = z*. 

Proof. First, recall that ro = P\V\ and thus, by (12) 

Vf 1 r 0 =Pie 1 , j = 1,2, ...,m + 1 . (21) 
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Using (13') and (12), we obtain 


V?A a A Vk = S?Sk . (22) 

a) In view of (14'), x k = + VfeZfc where zj. is the solution of 

min ||ro — AV*.z|| or, equivalently, V^ 1 A H r 0 = V k I A H AV k z . (7') 

zeC* 

By (22), (13'), and (21) (for j = k + 1), the linear system in (7') is the same as (18). 

b) (10), (14'), and (11) imply that 

z* = *o + A H V k yk , r k = r 0 - A H A V k yk with y k £ C k . 

The minimization property (9) is equivalent to 

0 = (®* — Xk,A H v) = (rjfe,i;) for all v € Kk . 

By (14'), it suffices to consider these equations for v = vj, j — 1, . . . , k, and it follows that 
yk is the solution of 

V k H r 0 = V k H A H A V fey 

which, by (21) (for j = k) and (22), is just the linear system (19). 

For c), we similarly obtain that x k = ® 0 + V fel/fc satisfies (8) iff yk solves the linear system 

frei = V k H Vk+iS k y 

whose coefficient matrix, by (12) and (16), is T k + i<rlk . If a — 0 and T k is singular, the 
linear system (20) could have a solution only if it was consistent. Using the fact that T k - \ 
is nonsingular then and > 0, one easily verifies that this can not be the case, 
d) Part a) of Proposition 1 implies that K m is an invariant subspace for T and, since 
ro € K m , we conclude that 

- *o = A~ l r 0 G A -1 K m = K m = A H K m • 

On the other hand, x* trivially satisfies (7)-(9) and it follows that x m = ** for all three 
methods. ■ 


3. Practical implementations 

For the special case of real symmetric matrices A, the result of Proposition 2 is due 
to Paige and Saunders [30], and their derivation of SYMMLQ and MINRES is based 
on solving (18)-(20) by an LQ factorization of T k . Using similar ideas, it is possible to 
obtain practical implementations for the MR, ME, and GAL methods for the class of non- 
Hermitian matrices (6). In the following, we sketch such derivations. Details which are 
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completely analogous to the real symmetric case will be omitted, and we refer the reader 
to [30] instead. 

For the solution of the linear systems (18)-(20) a QR factorization of the k + 1 x k 
matrix (16) Sk of the type 

QkSk = ( ) (23) 

is used, where Q*isa& + lx&-fl unitary matrix and Rk a nonsingular matrix of the 
form 

€3 0 • • • 0 \ 

S 3 : 

73 0 

Cfc 

S k 

0 7/t / 

Note that Sk is tridiagonal and, by (17), has full rank, and a decomposition (23) clearly 
exists. Moreover, since 

S?S k = T 2 k +<T 2 Ik+0l +1 ekel 

is a real matrix, we can choose Qk such that Rk is real, and this will help to reduce 
complex arithmetic in the final algorithms. Using standard matrix calculus, one verifies 
that a factorization (23) with real Rk can be achieved with a unitary matrix Qk of the 
form 

Qk = Qk,k + 1 Rk Qk-l,k Rk- 1 “ • D 3 C?2,3 Q\,1 D\ 

with complex diagonal matrices 


/7i S 2 
0 72 


Rk = 


\0 • 


Dj = diag(l,...,l,e* v »',l,...,l) , <Pj € 1R , 

T 

j 


and real Givens rotations 


Qj,j + 1 - 


/I 



\ 


*- 3 

*-3+1 


cj,sj e 1R, c 2 + s 2 = 1 . 


\ 
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Moreover, the factorization is easily updated from the one of the previous step k — 1 by 
simply setting 

tk = Sk-2/3k , h = -Sfc-iQfc + Ck-\Ck-2^k COS^fc-j , 


hk = - 

-Sk-iCk- 2 pke upk ~ 1 + c k -i(a k - itr) 

, 7k = \hk\ , 

(24) 

e ivk = | 

hk/\hk\ if hk ^ 0 

0 if hk = 0 


and 

7k = 

iJil+Pl+i . e» = 7»/7* . 

= Pk+l/7k • 

(25) 

Note that 

«*=<?*,*+ J) 

# 

(26) 


Based on the factorization (23), the solutions of (18)-(20) and therefore, in view of 
Proposition 2, the actual iterates Xk of the MR, ME, and GAL methods can now be 
computed in a numerically stable manner. Moreover, by arranging the calculations in 
analogy to the real symmetric case [30] it is possible to obtain simple recursions for the 
x k‘ 

First, we consider the MR method. With (18) and (23), it follows that 

x k = x 0 + VkR^tk where t k := Pi(h 0) Qk^i € C fc . 

(26) shows that tk differs from tk-i only by its kth entry r]k and with pk denoting the last 
column of VkR^ 1 the recursion 

x k = Xk- 1 + rjkpk 

results (cf. [30]). In combination with Algorithm 1, this leads to the following implemen- 
tation. 

Algorithm 2 (MR method). 

0) Choose io € C" and set v ~ b — Ax o, Vo = Po = P- 1 = 0, 

Pi = Vi = IM|, Co = c-i = 1, so = s-i =<p 0 =0. 

For fc = 1,2,... 

1) If (3k = 0, stop: Xk-i solves Ax = b. 

Otherwise, compute 

2) v k = v/fik, o-k = ( Tvk,v k ), 

v = Tv k -ct k Vk - PkVk-i, Ph+i = IM|, 

and then ek, Sk, 7k, <Pk, Ck, $k using formulae (24), (25), 

3) Pk = (vk - &kPk-i — tkPk-i)hk, 

Xk = Xk - 1 + TjkPk with i Ik ~ CkVke 1 *", 
fjk+i = - Skfjke tv ’ fc . 


We now turn to the ME and GAL methods. With (23) and by setting 

W k = (w l ,W2,... ,w k ) := A H V k Rk l , 
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it follows from part b) of Proposition 2 that 

x 1 ^ = z 0 + W k y k where t/fc is the solution of /3 a ej = R^y . 
Similarly, using that, by (16), 


n + ((*) 

and with (23), (26), one deduces from (20) that x < £ AL exists iff c k ^ 0 and then 
xf AL = x 0 + W k y k , W*:=VfcQj_i diag(l,l,...,l,e^) , 

where y k is the solution of 


/3iei = R k diag(l,...,l, c fc )y . 

Clearly, y k and y k differ only in their last elements y*. and rj k . Moreover, with (13'), (23), 
and (26), one easily verifies that W k is identical to Wk up to its last column w k . Hence, 
we obtain the recursions 

X ¥ E = x k?i + VkWk and, if c k / 0, x^ = zJS + *?*«>*: (27) 

(cf. [30]). The resulting implementations can be summarized as follows: 

Algorithm 3 (ME/GAL method). 

0) Choose xo £ C" and set = i^ 41, = x 0 , v = 6 — Azo, uo = wo = 0, 

Pi = IMI> Vo — -1, c 0 = c_! = 1, s 0 = s_i = v? 0 = y_i = 0. 

Ifydl >0, set Uj = v//3j. 

For fc = 1,2, . . . 

1) IS (3k = 0, stop: = zjj^ = z* solves Ax = 6. 

Otherwise, compute 

2) a k = (Tv*,vfc), 

v = Tujfe - - ^jtVfc-iZ/Sfc+i = |M|, 

and then e k , 6 k , jk, Ik, <Pk, c k , &k using formulae (24), (25), 

3) w k - e" f>k (-Sk-iu>k-i + Ck-iVk) 

and, if 7 * ^ 0 and the Galerkin iterate is desired, 
x gal _ x me^ + fj k w k with rjk = -(6 k ij k - 1 + e*yfc- 2 )/ 7 *- 

4) Set = v/Pk+i, if (3 k+i > 0, and v k +i = 0 otherwise, 
w k = C k Wk + s k v k +i, 

X ME _ X ME + VkWk with Vk _ __(g kTjk _ 1 + e k T] k - 2 )/7k. 

The finite termination property of the Lanczos algorithm does no longer hold in the 
presence of roundoff error (see e.g. [21, pp. 332]), and the stopping criterion stated in 
Algorithms 2 and 3 is not useful in practice. Instead, one should terminate the iteration as 
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soon as ||rfc|| is sufficiently reduced. Note that, similar to the real symmetric case [30], | |r*fc ] ] 
can be obtained without computing the vector r k itself by using the following identities: 


Ikm = Jnl + iiL l + nl4 + > , 

|kf j4L !| = /3 k+1 \s k -iVk-i + Ck-ir} k e" fih \ . 

Finally, consider linear systems Ax = b with coefficient matrices A of the more general 
class (4) with D a positive definite diagonal matrix. Then, Ax = b is equivalent to the 
linear system 

A'x' = b' where A' = e~ ie D~ 1/2 AD~ 1/2 , x' = D 1/2 x , b' = e~ i9 D~ x l 2 b , 

whose coefficient matrix A ' is now of the form (6), so that we can use Algorithm 2 or 3 for 
its solution. Note that one never needs to form A' and b' explicitly, and it is straightforward 
to rewrite both Algorithms 2 and 3 in terms of the original linear system Ax = b. We 
omit the details and only state that the resulting MR, ME, and GAL algorithms generate 
iterates which are characterized by the properties (7) (with || || = || ||d-i), (9) (with 
|| || = || ||£>), and (8), respectively, where now 

K k = K k {D- 1 r 0 ,D- 1 A) , K* k = K k {D~ l A H rVo,^ 1 A) . 


4. Comparisons with other implementations. Operation counts 

Several authors [26,38,1] have proposed algorithms for the computation of the MR resp. 
GAL iterates (7) resp. (8). However, most of these implementations (like Orthomin 
and Orthores in [26]) are modelled after variants of the conjugate residual or conjugate 
gradient algorithm for Hermitian positive definite matrices. It is well known [30,5,40] that, 
for Hermitian indefinite A, these approaches are numerically unstable and can even break 
down; e.g. for the GAL method this occurs whenever a Galerkin iterate does not exist 
(cf. [30] and part c) of Proposition 2.). The same stability problems can arise for the non- 
Hermitian matrices (6) if a is small. Hence, all these algorithms derived directly from the 
positive definite case are stable only for matrices (6) which fulfill additional requirements 
such as T positive definite or |<r| bounded away from 0. Note that these two conditions 
are not satisfied for most of the applications mentioned in the introduction. 

Here, we consider only implementations which are numerically stable for the general 
class of matrices (6). Among the proposed algorithms in the literature merely the Orthodir 
approach [26,1] for the computation of the MR iterates has this property. This algorithm 
can be stated as follows. 
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Algorithm 4 (Orthodir MR implementation). 

0) Choose xo E C 1 and set so = ro = b — Ax o, 
qo = As 0 , s- 1 = 9-1 = 0, i/o = 0. 

For k = 0,1,... 

1) If qk =0, stop: Xk solves Ax = h. 

Otherwise, compute 

2) A k = (r fc ,9fc)/||9fc|| 2 , 

Sfc+i = Xk + AfcSfc, r k + 1 = rk - Ajfc^it, 

3) p k = {Tq k ,qk)/\\qk\\ 2 and, if k > 0, u k = | kfc | [ 2 / 1 || 2 , 

= qk — (pk + i<r)sk — VkSk- 1 , 

qk+i = Tqk — pkqk — vkqk-i- 


We remark that qk = .As*; and that the search directions Sk are up to scalar factors identical 
to the vectors pk in Algorithm 2. 

Next, the results of operation counts for Algorithms 2,3,4 are presented in Table 4.1. 
Although we solve complex linear systems, most of the scalars (like a* and (3k in the 
Lanczos step of Algorithm 2 and 3) occuring in the computations are real. Moreover, on 
some machines, implementations in real arithmetic are more advantageous. Therefore, we 
compare work and storage in terms of real quantities. Listed are the number of matrix- 
vector products T -v, v E IR n , the approximate number m of additional real multiplications 
per iteration, and the number s of real vectors (of length n) to be stored. The computation 
of inner products often constitutes a bottle-neck on modern computers. For this reason, 
we also give the number dp of dot products x • y, x,y 6 H n per iteration. Finally, notice 
that — based on the simple observation stated in Proposition 3 below — work and storage 
for the MR and ME/GAL methods can be significantly reduced if the Hermitian part T of 
the matrix (6) is real. This case occurs frequently in the cited application, and we included 
the corresponding operation counts in Table 4.1. 

Proposition 3. Let T be real and assume that r o := b — Ax o E jR n . Then, all the vectors 
Vk, k = 0,1,..., in Algorithm 2 and 3 are real. In addition, for the MR method, all search 
directions pk are real vectors. 

Note that often the right-hand side b is a real vector, and then the standard starting 
guess x 0 = 0 gurantees that r 0 is real. In the general case b & C n and if a ^ 0, the 
condition ro E lR n can always be fulfilled by choosing the starting vector xo = 
appropriately, e.g. x^ = 0 and x^ = Im b/a. However, such a strategy might not be 
desirable, if one already knows a good approximation xq for the exact solution of Ax = 6. 


Table 4.1 


To explain the numbers given in Table 4.1, a few more comments are necessary. For 
the ME/ GAL algorithm, we have assumed that the Galerkin is, if desired, only computed in 
the very last step of the iteration. Furthermore, in order to reduce the computational work, 
note that, in the MR Algorithm 2, one computes the vector 'fkPk instead of pk- Similarly, 
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in part 4) of Algorithm 3, the vector Wk itself is never needed and, hence, rjkWk is generated 
directly. Moreover, using fast Givens rotations (e.g. [21, p.158]), we compute the rescaled 
vector fkWk instead of Wk in step 3) of Algorithm 3. Here, fk := l/(ck-i cos <pk) for the 
case that Sk-i < Ck-i and |siny?fc| < j cos 1, and fk is defined correspondingly for the 
remaining cases. Note that then only 4n real multiplications are needed for updating fkWk 
from fk-iWk-i and v*. 

We conclude this section with a few further remarks. First, Table 4.1 clearly shows 
that the MR implementation stated in Algorithm 2 is less expensive than the Orthodir 
Algorithm 4. For real symmetric linear systems, Algorithm 2 and 3 reduce to MINRES 
and SYMMLQ [30], respectively. Notice that, for the case of complex matrices (6) with 
T and ro real, Algorithm 2 and 3 require only little extra work and storage compared to 
MINRES and SYMMLQ. Finally, consider real linear systems with matrices (3) A = J — N 
with N — —N t (or, equivalently, A' = iA = T + il with T = — iN = T H if rewritten 
in the form (6)). It can be shown, that for this case, the Galerkin part of Algorithm 3 
is equivalent to the method of Concus, Golub [6], and Widlund [44]. Also, note that, in 
[18,39], we have investigated an Orthodir type implementation of the ME approach for the 
class A = / — N . 


5. Error bounds 

In this section, we derive error bounds for the MR and ME methods. Let a < \min{T) and 
/? > A max (Y) be given bounds for the extreme eigenvalues of T. Therefore, all eigenvalues of 
A are contained in the complex line segment S := [a + icr, fl + i<r]. For the rest of the paper, 
we assume that in the Hermitian case cr = 0, A — T is positive definite and 0 < a < (3. 
This guarantees that 0^5. We denote by H* the set of all complex polynomials of degree 
at most k. 

By the standard technique, using 

Kk{r 0 ,A) = {q{A)r 0 \ q G H*^} (28) 

and the expansion (15) of 7*o (note that A and T have the same eigenvectors!), one obtains 
from (7) the estimate 


Ikfc^ll/IMI < min max |1 - A?(A)| . • 

qZzllk-i a£S 

Similarly, with (10), (28), we deduce from (9) that 

llx.-xril/ll^-xoll < min max |1 - |A| 2 q(A)| . 

q€Uk-i Aes 

With the linear transformation 

/ ^ - A) + 13 + a 

z = 2(A) = , 


(29) 


(30) 


( 31 ) 
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which maps S onto the unit interval [—1,1], the right-hand side of (29) can be rewritten 
in the form 


(Ek(a) :=) min max \p(z)\ 

V V ' p€ll*:p(a)=l *€[-1,1] 


(32) 


where 


2«r + 0 + a ^ r , 
a — 0 [ 1, 1] 

p — a 


(33) 


Furthermore, using the identity 

4 |A| 2 = (/? - a) 2 (z(A) - a)(z(A) - a) , A 6 S , 
one easily verifies that the upper bound in (30) is just F/^j(a) where 

(E[ r \a) : =) min max |p(z)| , 

v k w i>en fc (a) *€[-i,i] (34) 

11* (a) := {p € 11*. | p(a) = p(a) = 1 and, if a € IR, p'(cl) = 0} . 

We now turn our attention to the two approximation problems (32) and (34). It will 
be convenient to represent a in the form 


a = a(ip) = a(R) cos ij> + ib(R) sin ip , J2>1 , 0 <ip < 2w , (35) 

«(*):= i(fl+i) , K JJ)-l(B-l) ; 

clearly, this is possible for any a £ [—1,1]- For fixed R > 1, we set £r = {a = a(V>)|0 < 
ip < 27t} and remark that Sr describes the boundary of an ellipse with foci at ±1 and 
semi-axes a(R ), b(R). 

First, we consider the complex approximation problem (32). Its solution is classical 
for the case of real a where Tk(z)/Tk(a) is the optimal polynomial. Here, T* denotes the 
fcth Chebyshev polynomial which, by means of the Joukowsky map, is given by 

nw = l(r* + ^) , * = i(» + i) . (36) 

For purely imaginary a, the extremal polynomials were found by Freund and Ruscheweyh 
[19], but for general complex a the solution of (32) is not explicitly known. We now derive 
a new, very useful upper bound for the optimal value of (32). 

Theorem 1. Let R > 1 and k = 1,2, . . . . Then, 

R* < Efc ( a) - R* + l/Rk ’ aeSR • (37) 

Proof. The lower bound follows immediately from an inequality due to S.N. Bernstein (e.g. 
[29, Theorem 74]). In order to obtain the upper bound, we define the polynomial 

_ (. R k -l/R k )T k (z) + 2ism(kiP ) 

P{Z) (R k -l/R k )T k (a) + 2isin(kiP) ’ 1 ’ 
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where ip is given by the representation (35) of a. Clearly, p(a) = 1 , and, by using (35), 
(36), and the fact that —1 < Tk(z) < 1 on [— 1 , 1 ], one readily verifies that the uniform 
norm of p on [— 1 , 1 ] is just the stated upper bound. ■ 

Remark 4. For fixed R > 1 , the upper bound in (37) is optimal, with equality holding 
for the two real points of Sr. The optimal lower bound is unknown, but it is conjectured 
to be 2 /(R k + R k ~ 2 ) which is just the optimal value of (32) for the two purely imaginary 
points of Sr (cf. [19]). 

Remark 5. The polynomials (38) were recently introduced by Fischer and Freund [15], 
and it was shown that they are the optimal solutions for certain constrained approximation 
problems on ellipses. 

Next, we study the approximation problem (34), and we will show that it is closely 
related to the classical Zolotarev problem 

min max \z k + r)kz k ~ l — q{z)\ , rj £ 1R , k = 2,3,... . (39) 

?en fc _ 2 ze[- 1,1] 

It is well known that there always exists a unique best approximation qk(z; rj) for (39) and 
the corresponding polynomials 

Z k(z m ,v) = Z k +7)kz k ~ 1 - qk(z',7]) , 7/ G 1R , k = 2,3,... , 

are called Zolotarev polynomials. We refer the reader to [4] for a detailed study of these 
polynomials. Note that 

Z t (z;,)=2-‘(l + |,|)‘r i (^i5.) for , (40) 

and for the remaining values of 77 there are representations of Zk{z’,rj) in terms of elliptic 
functions. 

Theorem 2. Let a = a(ip) £ Sr, R> 1, k = 2, 3, . . . . Then, there exists a unique optimal 
polynomial pk(z; a) for (34). If ip = jrr/(k — 1) with an integer j ^ 0 mod k — 1, then 

, A Tk-i(z) / r ) 2 

pk(z and E k ( a )- R k-i + 1 / R k-i • 


Otherwise, 


Pk(z;a ) = 


Z k(z;r] ) 


z k(a;v) 

where rj = 77 (a) is the unique solution of 

lmZk(a;q) = 0 ( resp . Z f k (a;r]) = 0 , if a £ 1R) , 77 £ JR . 

In particular, if ip satisfies for some integer j ^ 0 mod k 


jit 77 sin 2 *r- 

cos ip = cos — — 

k a(R) + sign 77 cos 


7T 


with I 77 1 < tan 2 — — , 77 £ ]R 

2k 


(41) 


(42) 


(43) 
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then 


(44) 


?*(*;<») = 


with p defined by 


r ‘Q 

\v\ 


and E^\a) 


HR? 


p k + 1 / p k 


2 ^ p ^ ^ ^ 1 + M a(iZ) + sign 77 cos ^r- 


P> 1 


(45) 


Proof. Writings £ Ilfc(a) in the form p(z) = 1 — (z — a)(z — a)q(z), q £ IIfc_ 2 , one recognizes 
(34) as a linear Chebyshev approximation problem, for which, since a $ [— 1 , 1 ], Haar’s 
condition is satisfied. Standard results from approximation theory (see e.g. [29]) guarantee 
that there always exists a unique optimal polynomial p k (z;a) for (34). Moreover, because 
of the sy mm etry of the problem with respect to the real axis, pk is a real polynomial, 
and pk is charcterized by assuming its maximum absolute value at at least k points in 
[—1,1] with alternating signs. This alternation property implies that pk has degree k — 1 
or k. First, consider the case k — 1 . Since the scalar multiples of T k -i are the only 
polynomials of degree k — 1 with an alternating set of length at least k, we conclude that 
Pk(z;a) = Tk-i(z)/Tk~i(a), and, in view of pk £ n*(a), this case occurs iff Tk-i(a) £ 1R 
and a ^ H. With (35), (36), one readily verifies that these are just the points a = a(ip) 
with ip = jir(k — 1), j / 0 mod k — 1. Now we turn to the case that pk is of degree k. Since 
the optimal polynomials for the Zolotarev problem (39) are characterized by the same 
alternating property as pk, it follows that pk is of the form (41) with a suitable 77 £ 1R. In 
order to guarantee pk £ Iljfc(a), 77 must be the solution of (42). 

Now, let 77 £ 1R, [ 77 1 < tan 2 With (40) and (36), we conclude that a satisfies (42) iff 


/- ^ a + V 1, , j* , *, 1 x •_ 3* 

<“ : => TTm = 2 (p + p y cos T + 2 (p - p )sm T 


(46) 


for some p > 1 and some integer j 7 ^ 0 mod k. By using the representation (35) of a and 
by equating the real (resp. imaginary) parts of (46), one arrives at two real nonlinear 
equations for the unknowns cosip and p, and a straightforward, but lenghty calculation 
shows that the solutions are given by (43) and (45). Finally, note that the first identity 
in (44) is a consequence of (41) and (40); the second one follows from .E^(a) = l/|Tfc(d)| 
and (46). ■ 

For general a, (41) and (42) lead to rather complicated and not very useful formulae 
for E^\a) in terms of elliptic integrals. Next, we derive simple bounds for this quantity. 


Theorem 3. Let R > 1 and k = 2,3, . . . . Then, for a = a{ip) £ Sr 

2 E (r )( , bk-imh-iWi+bkmfkin 

R k + 1/R k - k { 1 - b 2k ^(R) + b 1 (R)f 2k -i(i’) 


(=: B[ r) (ap 


( 47 ) 


where 


&,(*) = \(R’ - jj) , fM) = 


{ sin{jip)/ sin ip 
(-1 
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if sin ip 7 ^ 0 
if ip = hr 



Both bounds in (47) are attained if ip = jit /k, j ^ Omod A:. In addition, the upper estimate 
is sharp for ip = jn/(k — 1 ), j ^ Omodfc — 1 . 

Proof. Duffin and Schaeffer [9] showed that for any real polynomial of degree at most k, 
|p(z)| < M on [—1,1] implies |p(a)| < M(R k + l/R k )/2 for all a £ £r. Application of this 
result to pk(z;a) yields the lower bound in (47). In order to obtain the upper bound, we 
consider polynomials p(z) = 'yTk(z) + STk-i(z) 6 Ilfc(a) with 7 , 6 £ IR. With (35) and 
(36), one readily verifies that p £ Ilfc(a) iff 7 and S satisfy 

/ (R k + l/R k )cos(kip) (Rb- 1 +l/f 2 fc - 1 )cos(fc-l)V>W 7 \ / 2\ 

\ (iJ* - ~ \ 0 / 

A routine calculation shows that this linear system has a unique solution and that 

max |p(z)| = hi+|tf[ = #i r) (a) . 

z€[-l,l] 

Finally, the statements on the sharpness of (47) follow from Theorem 2. ■ 

Note that the bounds in (47) are asymptotically optimal, and we have the following 

Corollary 1. Let R > 1 and a =£ £r. Then, 

Jim (4 %))^ = Jim = 1 . 

K—>00 K —+ OO It 

The typical behavior of the optimal values of (32) and (34) and the bounds stated in 
Theorems 1 and 3 is illustrated in Fig. 5.1. For fixed R = 1.103 . . . and k = 30, the four 
curves 

Me) < Rt + y. gt - < 4 r) M < B k \«) . « = € £j i , 0 < v < >r/2 

are plotted. Note that Ek(a) = Ek(a ) = Ek(— a) (and analogous for E^\a)), and hence 
it suffices to consider only the points a in the first quadrant. 


Fig. 5.1 


The foilwing theorem summarizes our results on error bounds for the MR and ME 
methods. For the special case of matrices A = T + icrl with positive definite Hermitian 
part T, we also derive an error bound for the GAL method. 

Theorem 4. Let a < r> (T) and fd > A max (T) be given, and assume that 0 < a < /3 if 

<7 = 0. Let a be given by (33), and let R be the unique solution of 


1 1 y/(3 2 + cr 2 + s/a? + <r 2 

2 ( R+ R>~ f3 — a 


R> 1 


(48) 
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Then, for k = 1, . , 

a ) 


b) 


Wb-Ax^U 


< E k {a ) < 


||6-^o|| " “" vw/ " R k +1/R k 

•**- a! fc 4B ll ^ =.(»■) /.X ^ n (r) 


T^l - *- (a) - ™ 

c) 1ST is positive definite, then 


(49) 


(50) 


ll*.-*r-Hr 
H** - *0 It 


< 


\ 


^~7k) 2 

4<t 2 + a 2 (^/K + ^) 2 R k + 1/R k 


1 + 


(51) 


where k = fi/ a. 

Proof of part c). We set e k = x* — and fij = (T*e k ,e k ), j = —1,0,1. With (6) and 
since r k = .Ae*, one obtains 

(r k ,e k ) = .p 1 +iap 0 and ||r fc ||r_ x = pi + (r 2 . (52) 

Now let a£io + K k be arbitrary. By (8), (r k ,u — x k ) = 0, and therefore 

(nfc,e*) = (r fc ,s* -u) = {T~ 1/2 T k ,T 1,2 {x ir - u)) . (53) 

By application of the Cauchy-Schwartz inequality to (53) and with (52), we arrive at 


p\+(T 2 pl < II** — «||r (a*i +<r 2 fi-i) • 

Next, recall that, by the Kantorovich inequality (e.g. [26, p. 83]), 


Vi/i-i < A*o wliere 3 := Q(v / «+ 


-l 


(54) 


(55) 


Using (55) and the estimate > A m j n (T 2 ) = a 2 , we obtain from (54) 


^ l < ||** - u||t 


1 + (T 2 p-i/p! 


< ||** -u\\ 2 t 


a 2 + a 2 


l + a 2 s 2 (j,-i/pi " * " J a 2 + er 2 s 2 
Since u E zo + K k is arbitrary, ||x* — u||t in (56) can be replaced by 

min ||**-u||t = min ||p(^)e 0 ||r • 

u£zo+Kk p€llfc:p(0)=l 


(56) 


(57) 


By expanding eo into orthonormal eigenvectors of T (cf. (15)) and with (31), (32), (33), 
and (37), we obtain 


IW^oIIt < ||co||t -B fc(a) < I l e o | |t pfc - /pfc 

P en fc : P (o)=i K* + 1 /R* 


(58) 
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Finally, combining (56) - (58) yields the desired bound (51). ■ 

Remark 6. For the special case of a = 0, (51) and (49) reduce to the usual error bounds 
(see e.g. [39]) for the classical conjugate gradient and conjugate residual algorithms. 
Remark 7. We excluded Hermitian indefinite matrices A — T. Error bounds for this case 
can be found in Chandra [5] for the MR method and in [40,18,41] for the ME method. 
Remark 8. For the GAL method there are no satisfactory error bounds for the general 
class of matrices (6). 


6. Polynomial preconditioning 

Polynomial preconditioning aims at speeding up the convergence of conjugate gradient 
type methods for the solution of Ax — b by applying them to one of the two equivalent 
linear systems 

s(A)Ax = s(A)6 (59) 

(left preconditioning), or 

s(A)Ay = b , * = s(A)t/ (60) 

(right preconditioning). Here s is a suitably chosen polynomial of small degree. For 
the case of Hermitian positive definite A, the idea goes back to Rutishauser [34] who 
proposed polynomial preconditioning in the fifties as a remedy for roundoff in the classical 
CG algorithm. The recent revival [25] of Rutishauser’s method and the general interest in 
polynomial preconditioning is mainly motivated by the attractive features of this technique 
for vector and parallel computers (see [35] for a survey). 

In this section, we study polynomial preconditioning for the class of matrices (6) 
A — T + ial. Let / > 2 be any fixed integer. We seek a polynomial s £ H/_i with the 
following two properties: 

(i) the coefficient matrix s(A)A of (59) and (60) is again a shifted Hermitian matrix of 
the form (6) and 

(ii) the convergence of conjugate gradient type methods, applied to the preconditioned 
systems (59) or (60), is speeded up optimally. 

As in the previous section, let a.,f3 £ 1R be given such that 

a < fi < (3 for all eigenvalues fi of T , (61) 

and assume that 0 < a < /? if a — 0. Our criteria for optimal convergence in (ii) will 
be based on (61) as the only available information on the spectrum A and on the error 
bounds stated in Theorem 4. 

First, consider the requirement (i). For any s £ Hj-i, we can represent s(A)A in the 

form 

s(A)A = (T + iaI)s(T + ial ) = q(T) + irl (62) 

with q £ Hj and r £ IR. Note that s, q, and r is equivalent to 

(/x + i<r)s(n + ia) = q(fi) + ir and r := iq(—i<r) . (63) 
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Since q{T) is Hermitian iff q is a real polynomial, it follows from (62) that (i) is fulfilled 
iff q € Ilj r ^ := { 11/ | q has real coefficients } and r £ 1R. Therefore, from now on, it is 
assumed that s £ Dj-i satisfies (63) with q £ nj r ^ and r £ IR. 

Next, we turn to the question of optimal choice of q and r. A first, very tempting 
strategy is to require r = 0 and to choose q such that s(A)A = q(T) is positive definite. 
The preconditioned system (59) can then be solved by the standard CG method. Clearly, 
q{T) « I should approximate the identity matrix as best as possible. Using (61) and (63), 
we conclude that such an optimal q is given as the best approximation in 

min max |1 - q(/x)| . (64) 

9 en 1 (T) : 9 (-»a)=o #*£[“.^1 

For positive definite matrices A = T, this approach just leads to Rutishauser’s method [34]. 
For the non-Hermitian case <r / 0, (64) turns out to be equivalent to the approximation 
problem (34), and we have the following 

Theorem 5. Let a 0 and l > 2. Then, there exists a unique best approximation in 
(64) given by 


= i - pi 


( 


f3 + a — 2p 
(3 -a 



/3 + a + 2 itr 
f3 — a 


(65) 


where pi(z’,a) is the extremal polynomial of (34) (for k — l) with optimal value E^ r \a) 
(cf. Theorem 2). Moreover, the matrix s(A)A = q(T) is positive definite with eigenvalues 

in [1-Ef r) (a),l + i?[ r )(a)], and for the iterates Xk of the CG method, applied to (59), the 
estimates 


Ik* - Sfcllg(T) < 2 

ll**-*o|| g (T) “ R k + 1/R k 

hold. 


. l + vi^c bFW 


, ( 66 ) 


Proof. The linear transformation z(p) = (/? + a — 2 p)/((3 — a) maps [ot,f3] onto [—1,1]. 
Moreover, p(z(p)) = 1 — q(p) defines a one-to-one correspondence between all q £ Il[^ 
with q(— icr) = 0 and all real polynomials p £ IU(a). This shows that (64) and (34) are 
equivalent (recall that the optimal polynomial for (34) is real), and, hence, q* is indeed 
the unique best approximation in (64). The error bounds (66) follow from (51) and (48) 
(with <7 = 0, a = 1 — Fj r \a), and (3 = 1 + .£^( 0 )). ■ 

Recall (see Fig. 5.1) that for fixed l of moderate size and fixed R, E^ r \a) strongly 
depends on the position of a on the ellipse Sr. In particular, if a is close to the real points 
of the ellipse, E^ r \a) is significantly larger than for the other points of Sr. Therefore, (66) 
suggests that the polynomial (65) will yield a poor preconditioner for matrices A which are 
nearly Hermitian positive definite. This will be confirmed by numerical results presented 
in the next section. Therefore, in order to obtain a polynomial preconditioner which is 
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satisfactory for all a £ £ r , it is crucial to treat r in (62) as a free parameter, and, next, we 
determine optimal choices of p and r for speeding up the MR and ME algorithms. 

First, consider the MR method. Here right preconditioning (60) is the more natural 
choice between (59) and (60), since residual vectors for (60) are also residual vectors of 
the original linear system. Let yk denote the A:th iterate of the MR algorithm applied to 
s(A)Ay = b , and set x pp = s(A)yk . Moreover, let Xk be the fcth approximation generated 
by the MR method applied to the original system Ax = b. Then, assuming that xo = x pp , 
it follows with (60) that Kk(s(A)r 0 ,s(A)A) C Kki(ro, A) and x pp ,xui 6 ®o + Kki(ro,A). 
Hence, the minimization property (7) implies that ||6 — Ax^i || < ||6 — Ax pp \\. Therefore, 
in view of (49), we conclude that, based on (61) as the only information on the spectrum 
of A, the best possible choice of s £ H/_i is one which guarantees the estimates 


llt-A.f’ll < 2 

||4-Xx,|| “ R tl + 1/R kl 


k = 1 , 2 ,.. 


(67) 


with R defined in (48). We call s £ H/_i an optimal polynomial preconditioner for the 
MR algorithm if it leads to the error bounds (67). 

Similarly, for the ME method with left polynomial preconditioning (59), the error 
bounds (50) and Corollary 1 suggest that the best possible choice of s £ Hj-i is one for 
which the iterates x pp satisfy 


\ x *~ x k P \ 
|x* — aJo | 



A: = 1,2,... 


( 68 ) 


for some a £ Sri . A polynomial s £ H/_i is called an optimal preconditioner for the ME 
approach if it guarantees (68). 

With this notion of optimality, we can now state the main result of this section as 
follows. 


Theorem 6. Let l > 2. Then, 


. qi(X-itr)+ir 

-»/— 1( A) = (69) 

where 

qi(p) = TA 2 - ^ - ~ 13 ~ Q ) - R eTi(-a) and t = -ImT^-a) , (70) 

p — a 

is an optimal polynomial preconditioner for the MR and ME methods. Here, Ti denotes 
the Ith Chebyshev polynomial (cf. (36)) and a is given in (65). 

Proof. First, note that, by (70), qi(—i(r) = — it, and thus ( 69 ) defines indeed a polynomial 
s £ H/_i. Next, consider the preconditioned matrix A = s(A)A. With ( 61 ) and since T) 
maps the interval [— 1 , 1 ] onto itself, it follows that the eigenvalues of the Hermitian part 
qi(T) of A are contained in [S,/3] where a := — 1 — Re Ti(— a) and /? := 1 — R eTi(-a). Now 
we apply Theorem 4 (with a = a, (3 — f3, and <r = r) and note that, by (35) and (36), 


- $ + a + 2ir 

a = ■= ; = -Ti(-a) € t R i 

p — a 
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The error bounds (67) and (68) are then an immediate consequence of parts a) and b), 
respectively, of Theorem 4. Hence s/_i is an optimal polynomial preconditioner, and the 
proof is complete. ■ 

Remark 9. In [10], Eiermann, Li, and Varga developed a general theory for polynomial 
preconditioning for asymptotically optimal semi-iterative methods. In particular, by means 
of Theorem 6 from [10], one can show that the polynomial preconditioner (69) is also best 
possible for semi-iterative procedures for the class of matrices (6). 

Remark 10. For the GAL approach, there are in general no error bounds on which we 
could base the choice of a best possible polynomial s. However, in analogy to the case 
of real symmetric matrices (see [42,40,41]), preconditioning for the GAL method can be 
motivated by its close connection (cf. (27)) to the ME algorithm. Therefore, we regard 
(69) also as an optimal polynomial preconditioner for the GAL method. 

Finally, note that polynomial preconditioning is very easily incorporated into the MR 
and ME/GAL Algorithms 2 and 3. Right preconditioning leads to slightly more economical 
implementations, and only this choice is considered in the sequel. The idea is to apply the 
CG type methods to the linear system sj_i(A)Ay = b — Axo with starting guess yo = 0. 
The resulting iterates y k of the MR and ME/GAL approaches are generated by Algorithm 
2 and 3, respectively, modified in the following way: substitute y k for x k , replace in (24) 
<t by r (defined in (70)), and, finally, perform in 2) the following Lanczos step 


v = z (fc) -a kVk-fikVk-i where := Tj(— - — T- ^ + * l)v k , a k := (z (fc \vfc) , (71) 

p — a p — a 

and set a k — a k — Re Xj(— a). We remark that for this computation only T), but never 
the complex polynomial (69), is used. The actual preconditioner sj_i appears only in the 
translation of the y k into the corresponding iterates 

s* = + si-i(A)y k (72) 

for the original system Ax = b. However, we do not need to generate x k in each step. 
Note that the norm ||r*|] of the residual r k = b — Ax k is available (cf. Section 3) from 
the procedure generating y k , and the iteration is stopped as soon as ||rjfej| is sufficiently 
reduced. Hence, x k is computed only once, namely in the very last step of the algorithm. 

Finally, notice that z ^ in (71) can be obtained by performing l steps of the classical 
Chebyshev semi-iterative method (see Golub and Varga [22]). More precisely, setting 



the three-term recurrence formula of the Chebyshev polynomials leads 


2 

/? — a 
to the following 
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Algorithm 5 (Computation of z ^ in (71)). 

0) Set — Vk and z[ k ^ =±= u> T'v k - 

1 ) For j — 2, . . . , l, compute 
zf ] = 2 u>T'z {k \ - z (k \. 

2) Set z<*> = z\ k) . 

Remark 11. The computation of z ^ via Algorithm 5 requires 21 matrix- vector products 
T ■ v, v € Ift n , and 21 additional real multiplications. If T and r 0 are real (cf. Section 4), 

all z^ are real too, and the work is halved. 

Similarly, using (69), (70), and again the three-term recurrence formula of the Cheby- 
shev polynomials, a routine calculation shows that the following algorithm just yields the 
iterate (72). 

Algorithm 6 (Computation of x k in (72)). 

0) Set h = I/*, and h [ k ) = 2u(T'yk - + i<r)yk)- 

1 ) For j = 2, — 1, compute 

h (k) = 2wT'h ( j k } 1 - h\ k \ + 2Tj(—a)y k . 

2) Set Xk = x 0 


7. Numerical experiments 

We have performed numerical experiments with all algorithms considered in this paper 
in numerous cases. Mostly, linear systems arising from the complex Helmholtz equation 
(5) were used as test problems. In this section, we present a few typical results of these 
experiments. 

Consider (5) on the unit square (0,1) x (0,1) with Dirichlet boundary conditions, 
and assume that Oi , <r 2 € 1R are constant coefficients. Then, approximating (5) by finite 
differences on a uniform m x m grid with mesh size h = 1 /(m + 1) leads to a linear system 
with coefficient matrix 

A = T - f- ial , T A 0 — <rih 2 I , cr := a 2 h 2 . . (74) 

Here A 0 is the symmetric positive definite matrix arising from the usual five-point dis- 
cretization of —A. Note that the eigenvalues of Ao are known, and for our experiments 
with polynomial preconditioning we have used the true values 

a = Amin^o) - a^h 2 , /3 = A max (A 0 ) - <r\h 2 (75) 

of the extreme eigenvalues of T (cf. (61)). For the constants in (74), values of the form 
(Tj = <7 j (V 1 ), 02 == 02 (VO were chosen. Here 0 < V* ^ tt/ 2 is a parameter such that the 
points a(V’) = (/3 + a + 2icr)/((3 — a) all lie on the same ellipse £r, R > 1 fixed, with V’ 
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describing the position of a(ip) on £r (see (33) and (35)). The case ip = 0 corresponds to 
a symmetric positive definite matrix (74), and for our experiments, we have chosen R > 1 
such that A = Aq for ip = 0. Moreover, notice that with increasing ip, the symmetric part 
T of (74) becomes more and more indefinite and a = —(3 for ip = tt/2. Also, the shift cr 
increases with ip. Finally, we remark that the error bounds of Theorem 4 suggest that the 
MR and ME methods should display similar convergence rates for all ip. 

For our numerical examples, the mesh size h = 1/64 was used resulting in matrices 
(74) of dimension n = 3969. All the computations were done on a Cray-2. The exact 
solution z* was generated with random components in [—1,1] -f i[— 1,1], and then the 
right-hand side was set to b := Az*. As starting vector zo = 0 was chosen. As stopping 
criterion, we used 

I \b — Aik 1 1 < 10~ 6 1 16 — Ax 0 1 1 . (76) 

In the following tables, for several values of ip (stated in degree!) and the various CG 
type methods, we list the number of iterations which were necessary to reach (76). A 
indicates that the process still had not converged after 200 steps. In Table 7.1 the results 
for the MR, ME, and GAL Algorithms 2 resp. 3 (without preconditioning) are given. 
The Tables 7.2, 7.3, and 7.4 display the behavior of the three methods combined with 
the polynomial preconditioner (69) with l = 6,11, and 16, respectively. Also listed are 
the results for the ZPCG method consisting of the classical CG algorithm with Zolotarev 
polynomial preconditioner (65) (see Theorem 5). 


Table 7.1 


Table 7.2 


Table 7.3 


Table 7.4 


From these results, we draw the following conclusions. If used without preconditioning, the 
MR method appears to be superior to the ME and GAL approaches. However, note that 
the stopping criterion (76) is based on the norm of the residual, and this is more favorable 
for the MR method. A comparison based on the Euclidian norm of the error vector z* — z* 
displays a similar convergence behavior for the ME and MR approaches. In combination 
with polynomial preconditioning, the performance of all three methods PPMR, PPME, 
and PPGAL is nearly identical. Also, note that the polynomial (69) yields a very efficient 
preconditioner which reduces the number of iterations significantly in all examples. Finally, 
as already suspected in the previous section, the strategy leading to the ZPCG method is 


24 


a very dangerous one, and the algorithm even fails to converge if A is close to a positive 
definite matrix. 

We conclude this section with two further remarks. First, note that all the results for 
the PPMR, PPME, and PPGAL metods were obtained with right polynomial precondi- 
tioning (RPP) (cf. (60)). Experiments with left polynomial preconditioning (LPP) (see 
(59)) gave nearly identical results. However, since implementations of RPP are slightly 
more economical, we therefore recommend RPP over LPP. Finally, recall that for our tests, 
the true extreme eigenvalues (75) of T were used. Of course, in general, such information 
is not available. However, it is possible to obtain good estimates of these quantities after 
relatively few steps of the Lanczos Algorithm 1. 
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MR Algorithm 2 
ME/GAL Algorithm 3 
Orthodir Algorithm 4 

If T and r 0 are real: 

MR Algorithm 2 
ME/GAL Algorithm 3 


18n 

18n 

26n 


9n 

13n 


If A = T and ro are real: 

MINRES [30] 

SYMMLQ [30] 


Table 4.1 


8n 

8n 
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ME 
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215 

224 
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GAL 
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182 

198 

208 
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222 

225 
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xp / Degree 
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55 

60 

65 

70 

75 

80 

85 

90 

MR 
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224 
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232 
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239 

ME 
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237 
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245 
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259 

260 

263 

GAL 

236 

240 

244 

248 

253 

255 

259 

261 

264 
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47 

47 
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47 
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47 
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63 

47 

47 

47 

47 
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47 
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49 

49 

49 
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49 
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★ 

★ 
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99 
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